Outlying Subspace Detection for High-dimensional Data

Author

  • Hai Wang
Abstract

Knowledge discovery in databases, commonly referred to as data mining, has attracted enormous research effort over the past decade from domains such as databases, statistics, artificial intelligence, and data visualization. Most research in data mining, such as clustering, association rule mining, and classification, focuses on discovering the “large patterns” in databases (Ramaswamy, Rastogi & Shim, 2000). Yet it is also important to explore the “small patterns” in databases, which carry valuable information about interesting abnormal regularities. Outlier detection is a research problem in “small-pattern” mining. It aims to find a specified number of objects that are considerably dissimilar, exceptional, and inconsistent with respect to the majority of records in an input database. Numerous outlier detection methods have been proposed, including distribution-based methods (Barnett & Lewis, 1994; Hawkins, 1980), distance-based methods (Angiulli & Pizzuti, 2002; Knorr & Ng, 1998; Knorr & Ng, 1999; Ramaswamy, Rastogi & Shim, 2000; Wang, Zhang & Wang, 2005), density-based methods (Breuning, Kriegel, Sander & Xu, 2000; Jin, Tung & Han, 2001; Tang, Chen, Fu & Cheung, 2002), and clustering-based methods (Agrawal, Gehrke, Gunopulos & Raghavan, 1998; Ester, Kriegel, Sander & Xu, 1996; Hinneburg & Keim, 1998; Ng & Han, 1994; Sheikholeslami, Chatterjee & Zhang, 1999; Zhang, Whsu & Lee, 2005; Zhang, Ramakrishnan & Livny, 1996).

Similar resources

Detecting Outlying Subspaces for High-Dimensional Data: A Heuristic Search Approach

In this paper, we identify a new task for studying the outlying degree of high-dimensional data, i.e. finding the subspaces (subset of features) in which given points are outliers, and propose a novel detection algorithm, called HighD Outlying subspace Detection (HighDOD). We measure the outlying degree of the point using the sum of distances between this point and its k nearest neighbors. Heur...
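The outlying degree described in this abstract (the sum of distances between a point and its k nearest neighbors, evaluated within a chosen subset of features) can be sketched as follows. This is only an illustrative sketch: it assumes Euclidean distance and a NumPy array with one row per point, and the names `outlying_degree`, `outlying_subspaces`, and `threshold` are hypothetical. The exhaustive enumeration stands in for the heuristic search mentioned in the abstract, whose details are not given here.

```python
import itertools
import numpy as np

def outlying_degree(data, point_idx, subspace, k):
    """Sum of Euclidean distances from one point to its k nearest
    neighbours, measured only on the feature columns in `subspace`."""
    projected = data[:, list(subspace)]
    dists = np.linalg.norm(projected - projected[point_idx], axis=1)
    dists = np.delete(dists, point_idx)  # exclude the point itself
    return float(np.sort(dists)[:k].sum())

def outlying_subspaces(data, point_idx, k, threshold):
    """Brute-force enumeration of every non-empty feature subset
    (an assumption for illustration, not the heuristic search)."""
    n_features = data.shape[1]
    found = []
    for size in range(1, n_features + 1):
        for subspace in itertools.combinations(range(n_features), size):
            if outlying_degree(data, point_idx, subspace, k) >= threshold:
                found.append(subspace)
    return found
```

Because the number of feature subsets grows exponentially with the dimensionality, an exhaustive loop like this is practical only for a handful of features, which is presumably why a heuristic search is needed for high-dimensional data.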

HOS-Miner: A System for Detecting Outlying Subspaces of High-dimensional Data

We identify a new and interesting high-dimensional outlier detection problem in this paper, that is, detecting the subspaces in which given data points are outliers. We call the subspaces in which a data point is an outlier its Outlying Subspaces. In this paper, we propose the prototype of a dynamic subspace search system, called HOS-Miner (HOS stands for High-dimensional Outlying Subsp...

Outlying Subspace Detection for High-Dimensional Data

Knowledge discovery in databases, commonly referred to as data mining, has attracted enormous research efforts from different domains such as databases, statistics, artificial intelligence, data visualization, and so forth in the past decade. Most of the research work in data mining, such as clustering, association rule mining, and classification, focuses on discovering large patterns from databas...

Finding Key Knowledge Attribute Subspace of Outliers for High Dimensional Dataset

Detecting outliers is an important task in many applications. Since most applications involve high-dimensional data, traditional outlier detection methods become inefficient in such cases. To solve the problem, we propose the concept of outlying reduction by extending attribute reduction in rough set theory. Additionally, we define the key knowledge attribute subspace (KKAS), which can pro...

A Web-based Interactive Data Visualization System for Outlier Subspace Analysis

Detecting outliers from high-dimensional data is a challenging task since outliers mainly reside in various low-dimensional subspaces of the data. To tackle this challenge, subspace-analysis-based outlier detection approaches have been proposed recently. Detecting outlying subspaces in which a given data point is an outlier facilitates a better characterization process for detecting outliers for high...

Detecting High-Dimensional Outliers: the New Task, Algorithms and Performance

Outlier detection is a fundamental step in knowledge discovery in databases. With the increasing number of high-dimensional databases, existing outlier detection algorithms that work only in the context of the full space are unable to effectively screen out informative outliers. This is because the majority of these outliers exist only in subspaces. In this paper, we identify a new outlier detection t...

Publication date: 2006